
Neural Networks

Feedforward Neural Networks

A feedforward neural network is described by a directed acyclic graph, $G=(V, E)$, and a weight function over the edges, $w: E \rightarrow \mathbb{R}$. Nodes of the graph correspond to neurons. Each single neuron is modeled as a simple scalar function, $\sigma: \mathbb{R} \rightarrow \mathbb{R}$. We will focus on three possible functions for $\sigma$: the sign function, $\sigma(a)=\operatorname{sign}(a)$, the threshold function, $\sigma(a)=\mathbb{1}_{[a>0]}$, and the sigmoid function, $\sigma(a)=1/(1+\exp(-a))$, which is a smooth approximation to the threshold function. We call $\sigma$ the "activation" function of the neuron. Each edge in the graph links the output of some neuron to the input of another neuron. The input of a neuron is obtained by taking a weighted sum of the outputs of all the neurons connected to it, where the weighting is according to $w$.
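To make the three activation functions concrete, here is a minimal NumPy sketch (the function names are illustrative, not taken from the text):

```python
import numpy as np

def sign(a):
    # sign activation: sigma(a) = sign(a)
    return np.sign(a)

def threshold(a):
    # threshold activation: sigma(a) = 1 if a > 0, else 0
    return np.where(a > 0, 1.0, 0.0)

def sigmoid(a):
    # sigmoid activation: sigma(a) = 1 / (1 + exp(-a)),
    # a smooth approximation of the threshold function
    return 1.0 / (1.0 + np.exp(-a))
```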

To simplify the description of the calculation performed by the network, we further assume that the network is organized in layers. That is, the set of nodes can be decomposed into a union of (nonempty) disjoint subsets, $V=\cup_{t=0}^{T} V_t$, such that every edge in $E$ connects some node in $V_{t-1}$ to some node in $V_t$, for some $t \in [T]$. The bottom layer, $V_0$, is called the input layer. It contains $n+1$ neurons, where $n$ is the dimensionality of the input space. For every $i \in [n]$, the output of neuron $i$ in $V_0$ is simply $x_i$. The last neuron in $V_0$ is the "constant" neuron, which always outputs 1. We denote by $v_{t,i}$ the $i$th neuron of the $t$th layer and by $o_{t,i}(\mathbf{x})$ the output of $v_{t,i}$ when the network is fed with the input vector $\mathbf{x}$. Therefore, for $i \in [n]$ we have $o_{0,i}(\mathbf{x})=x_i$ and for $i=n+1$ we have $o_{0,i}(\mathbf{x})=1$. We now proceed with the calculation in a layer by layer manner. Suppose we have calculated the outputs of the neurons at layer $t$. Then, we can calculate the outputs of the neurons at layer $t+1$ as follows. Fix some $v_{t+1,j} \in V_{t+1}$. Let $a_{t+1,j}(\mathbf{x})$ denote the input to $v_{t+1,j}$ when the network is fed with the input vector $\mathbf{x}$. Then,

$$a_{t+1, j}(\mathbf{x})=\sum_{r:\left(v_{t, r}, v_{t+1, j}\right) \in E} w\left(\left(v_{t, r}, v_{t+1, j}\right)\right) o_{t, r}(\mathbf{x})$$

and

$$o_{t+1, j}(\mathbf{x})=\sigma\left(a_{t+1, j}(\mathbf{x})\right).$$

That is, the input to $v_{t+1,j}$ is a weighted sum of the outputs of the neurons in $V_t$ that are connected to $v_{t+1,j}$, where the weighting is according to $w$, and the output of $v_{t+1,j}$ is simply the application of the activation function $\sigma$ to its input. Layers $V_1, \ldots, V_{T-1}$ are often called hidden layers. The top layer, $V_T$, is called the output layer. In simple prediction problems the output layer contains a single neuron whose output is the output of the network.
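A minimal sketch of this layer-by-layer computation, assuming the layers are given as lists of node names and $w$ as a dictionary over edges (both representation choices are illustrative, not part of the text above):

```python
def forward(x, layers, w, sigma):
    """Layer-by-layer forward computation of a layered feedforward network.

    layers : list of lists of node names; layers[0] is the input layer V_0,
             whose last node is the constant neuron; layers[-1] is V_T.
    w      : dict mapping an edge (u, v) to its real-valued weight; edges
             absent from the graph are simply missing from the dict.
    sigma  : activation function applied to each neuron's input.
    """
    # Outputs of the input layer: o_{0,i} = x_i, and the constant neuron outputs 1.
    o = {v: xi for v, xi in zip(layers[0][:-1], x)}
    o[layers[0][-1]] = 1.0

    for t in range(1, len(layers)):
        new_o = {}
        for v in layers[t]:
            # a_{t,j}(x): weighted sum of the outputs of connected neurons in V_{t-1}.
            a = sum(w[(u, v)] * o[u] for u in layers[t - 1] if (u, v) in w)
            # o_{t,j}(x) = sigma(a_{t,j}(x)).
            new_o[v] = sigma(a)
        o = new_o

    return [o[v] for v in layers[-1]]
```

Note that a neuron with no incoming edges receives the empty sum 0 as input and therefore outputs $\sigma(0)$, matching the remark about the illustration below.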

We refer to $T$ as the number of layers in the network (excluding $V_0$), or the "depth" of the network. The size of the network is $|V|$. The "width" of the network is $\max_t |V_t|$. An illustration of a layered feedforward neural network of depth 2, size 10, and width 5, is given in the following. Note that there is a neuron in the hidden layer that has no incoming edges. This neuron will output the constant $\sigma(0)$.
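For concreteness, the depth-2, size-10, width-5 example can be instantiated with the `forward` and `sigmoid` sketches above, assuming an input dimension of 3 (so $V_0$ has 4 neurons, the hidden layer 5, and the output layer 1; this split is consistent with the stated size and width but not spelled out in the text):

```python
import numpy as np

# Assumed architecture: V_0 has n + 1 = 4 neurons (the last is the constant
# neuron), V_1 has 5 hidden neurons, V_2 has a single output neuron.
layers = [
    [("v", 0, i) for i in range(4)],   # V_0: input layer
    [("v", 1, i) for i in range(5)],   # V_1: hidden layer
    [("v", 2, 0)],                     # V_2: output layer
]
depth = len(layers) - 1                     # T = 2 (V_0 is excluded)
size = sum(len(layer) for layer in layers)  # |V| = 10
width = max(len(layer) for layer in layers) # max_t |V_t| = 5

# Random weights on all edges between consecutive layers (illustrative only).
rng = np.random.default_rng(0)
w = {(u, v): rng.normal()
     for t in range(1, len(layers))
     for u in layers[t - 1] for v in layers[t]}

print(depth, size, width)                        # 2 10 5
print(forward([0.5, -1.0, 2.0], layers, w, sigmoid))
```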

SGD and Backpropagation